SAT Scores and Equity: A Deep Dive into Racial and Socioeconomic Disparities

Introduction

The United States has long been regarded as a land of educational opportunity, where people from diverse backgrounds can pursue their goals through higher education. Beneath this narrative of equal opportunity, however, lies a complex web of systemic disparities that shape educational outcomes for different demographic groups. One widely cited barometer of these disparities is the SAT, a standardized test used in the college admissions process. Because the SAT influences individual academic trajectories, it also figures prominently in broader discussions about access, equity, and educational policy in the United States.

The American education system, though founded on principles of inclusivity and meritocracy, grapples with stark inequalities. These disparities are intertwined with socioeconomic status, geographic location, and racial identity, among other factors. SAT scores have long served as a measure of academic preparedness for college, but they have also been criticized for potentially perpetuating these disparities, particularly across racial and ethnic groups. This personal project examines SAT scores using data from the College Board SAT Suite Annual Reports. Through exploratory data analysis, it addresses the following questions:

  1. How do average SAT scores vary among different racial and ethnic groups in the United States?
  2. Have there been any notable changes in these disparities between 2016 and 2022?
  3. How does family income correlate with SAT performance, and are there disparities in access to test preparation resources based on socioeconomic status?
  4. Are certain groups more likely to face barriers or advantages in the college admissions process due to their SAT scores?

This project analyzes historical SAT data, using statistical and data visualization techniques to dissect the results, uncover underlying patterns, and shed light on how these disparities manifest within the American education system.

Data sources: SAT results were collected from the College Board SAT Suite Annual Reports. Median household income by race/ethnicity: https://www.statista.com/statistics/233324/median-household-income-in-the-united-states-by-race-or-ethnic-group/

Data Preparation

In [1]:
# Import the necessary libraries (only pandas and Plotly Express are used in this notebook)
import pandas as pd
# Suppress SettingWithCopyWarning: the cells below assign a 'Year' column to slices of the raw sheets
pd.options.mode.chained_assignment = None
import plotly.express as px
In [2]:
# Import the Excel files to be read
df_2022 = pd.read_excel('2022 SAT Data.xlsx')
df_2021 = pd.read_excel('2021 SAT Data.xlsx')
df_2020 = pd.read_excel('2020 SAT Data.xlsx')
df_2019 = pd.read_excel('2019 SAT Data.xlsx')
df_2018 = pd.read_excel('2018 SAT Data.xlsx')
df_2017 = pd.read_excel('2017 SAT Data.xlsx')
df_2016 = pd.read_excel('2016 SAT Data.xlsx')
In [3]:
# 2022 SAT Data Frame: Race/Ethnicity, Parental Education, Mean Family Income
df_ethnicity_2022 = df_2022.iloc[5:13,:]
df_ethnicity_2022.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_ethnicity_2022['Year']= '2022'


df_parent_education_2022 = df_2022.iloc[27:33,:]
df_parent_education_2022.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_parent_education_2022['Year']= '2022'

df_family_income_2022 = df_2022.iloc[35:40,:]
df_family_income_2022.columns=['Mean Family Income','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_family_income_2022['Year']= '2022'
In [4]:
# 2021 SAT Data Frame: Race/Ethnicity and Parental Education
df_ethnicity_2021 = df_2021.iloc[5:13,:]
df_ethnicity_2021.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_ethnicity_2021['Year']= '2021'


df_parent_education_2021 = df_2021.iloc[26:32,:]
df_parent_education_2021.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_parent_education_2021['Year']= '2021'
In [5]:
# 2020 SAT Data Frame: Race/Ethnicity and Parental Education
df_ethnicity_2020 = df_2020.iloc[5:13,:]
df_ethnicity_2020.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_ethnicity_2020['Year']= '2020'


df_parent_education_2020 = df_2020.iloc[26:32,:]
df_parent_education_2020.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_parent_education_2020['Year']= '2020'
In [6]:
# 2019 SAT Data Frame: Race/Ethnicity and Parental Education
df_ethnicity_2019 = df_2019.iloc[5:13,:]
df_ethnicity_2019.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_ethnicity_2019['Year']= '2019'


df_parent_education_2019 = df_2019.iloc[26:32,:]
df_parent_education_2019.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_parent_education_2019['Year']= '2019'
In [7]:
# 2018 SAT Data Frame: Race/Ethnicity and Parental Education
df_ethnicity_2018 = df_2018.iloc[5:13,:]
df_ethnicity_2018.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_ethnicity_2018['Year']= '2018'


df_parent_education_2018 = df_2018.iloc[26:32,:]
df_parent_education_2018.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_parent_education_2018['Year']= '2018'
In [8]:
# 2017 SAT Data Frame: Race/Ethnicity and Parental Education
df_ethnicity_2017 = df_2017.iloc[5:13,:]
df_ethnicity_2017.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_ethnicity_2017['Year']= '2017'


df_parent_education_2017 = df_2017.iloc[25:31,:]
df_parent_education_2017.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Met Both Benchmarks','Met ERW Benchmark','Met Math Benchmark','Met No Benchmarks']
df_parent_education_2017['Year']= '2017'
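The cells for 2017 to 2022 repeat the same slice, rename, and tag steps with different row ranges. A small helper could express that pattern once. This is a hypothetical refactor, not part of the original notebook: the names `extract_block` and `COLUMNS` are introduced here for illustration, and the helper calls `.copy()` so that assigning `Year` works without disabling pandas' chained-assignment warning.

```python
import pandas as pd

# Column layout used for the 2017-2022 College Board tables in this notebook
COLUMNS = ['Race/Ethnicity', 'Number of Test Takers', 'Percent', 'Mean Total Score',
           'Mean ERW Score', 'Mean Math Score', 'Met Both Benchmarks',
           'Met ERW Benchmark', 'Met Math Benchmark', 'Met No Benchmarks']


def extract_block(raw: pd.DataFrame, start: int, stop: int, year: int,
                  label: str = 'Race/Ethnicity') -> pd.DataFrame:
    """Slice rows [start, stop) of a raw sheet, name the columns, and tag the year.

    `label` names the first column, e.g. 'Parental Education' for that table.
    """
    block = raw.iloc[start:stop, :].copy()  # copy so later assignments hit a real frame
    block.columns = [label] + COLUMNS[1:]
    block['Year'] = str(year)  # the notebook stores years as strings
    return block.reset_index(drop=True)

# Hypothetical usage with the row ranges from the cells above:
# df_ethnicity_2021 = extract_block(df_2021, 5, 13, 2021)
# df_parent_education_2021 = extract_block(df_2021, 26, 32, 2021, label='Parental Education')
```

Each per-year cell would then shrink to one call per table, and a change to the column layout would only need to be made in one place.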
In [9]:
# 2016 SAT Data Frame: Race/Ethnicity, Parental Education, Mean Family Income
df_ethnicity_2016 = df_2016.iloc[0:9,:]
df_ethnicity_2016.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean ERW Score','Mean Math Score']
df_ethnicity_2016['Mean Total Score']= df_ethnicity_2016['Mean ERW Score'] + df_ethnicity_2016['Mean Math Score']
df_ethnicity_2016['Year']= '2016'
df_ethnicity_2016 = df_ethnicity_2016[['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Year']]

df_parent_education_2016 = df_2016.iloc[21:27,:]
df_parent_education_2016.columns=['Race/Ethnicity','Number of Test Takers','Percent','Mean ERW Score','Mean Math Score']
# Build the total from this frame's own columns so the row labels align
df_parent_education_2016['Mean Total Score']= df_parent_education_2016['Mean ERW Score'] + df_parent_education_2016['Mean Math Score']
df_parent_education_2016['Year']= '2016'
df_parent_education_2016 = df_parent_education_2016[['Race/Ethnicity','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Year']]


df_family_income_2016 = df_2016.iloc[11:20,:]
df_family_income_2016.columns=['Mean Family Income','Number of Test Takers','Percent','Mean ERW Score','Mean Math Score']
df_family_income_2016['Mean Total Score']= df_family_income_2016['Mean ERW Score'] + df_family_income_2016['Mean Math Score']
df_family_income_2016['Year']= '2016'
df_family_income_2016 = df_family_income_2016[['Mean Family Income','Number of Test Takers','Percent','Mean Total Score','Mean ERW Score','Mean Math Score','Year']]
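When deriving a column such as `Mean Total Score` from two others, both operands should come from the same slice. pandas aligns arithmetic on index labels, and slices taken with `iloc` keep the row labels of the original sheet, so adding columns from two different slices yields NaN rather than a row-by-row sum. The toy frames below use made-up values, not report data, to show the difference.

```python
import pandas as pd

# Two slices of one sheet keep their original row labels (toy values)
ethnicity = pd.DataFrame({'Mean Math Score': [471.0, 602.0]}, index=[0, 1])
parent_ed = pd.DataFrame({'Mean ERW Score': [420.0, 450.0],
                          'Mean Math Score': [430.0, 460.0]}, index=[21, 22])

# Mixing frames: no label is shared, so the union index [0, 1, 21, 22] is all NaN
mixed = parent_ed['Mean ERW Score'] + ethnicity['Mean Math Score']

# Same frame: labels 21 and 22 line up, so the values are summed row by row
total = parent_ed['Mean ERW Score'] + parent_ed['Mean Math Score']

print(mixed)
print(total)
```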
In [10]:
# Data frame for the 2021 U.S. median household income by race/ethnicity
US_medianincome2021 = { 'Race/Ethnicity': ['American Indian and Alaska Native', 'Asian','Black/African American', 'Hispanic/Latino','White'],
                    'Median Household Income': [49216, 101418, 45208, 57981, 71033]} # stored as numbers, not strings
df_US_US_medianincome2021= pd.DataFrame(US_medianincome2021)
In [11]:
# Combine Data Frames
df_ethnicity = pd.concat([df_ethnicity_2022,df_ethnicity_2021,df_ethnicity_2020,df_ethnicity_2019,df_ethnicity_2018,df_ethnicity_2017,df_ethnicity_2016],ignore_index=True)
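df_parent_education = pd.concat([df_parent_education_2022,df_parent_education_2021,df_parent_education_2020,df_parent_education_2019,df_parent_education_2018,df_parent_education_2017,df_parent_education_2016],ignore_index=True)
# The parental-education slices reuse the ethnicity column names, so relabel the first column
df_parent_education = df_parent_education.rename(columns={'Race/Ethnicity': 'Parental Education'})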

df_ethnicity
Out[11]:
Race/Ethnicity Number of Test Takers Percent Mean Total Score Mean ERW Score Mean Math Score Met Both Benchmarks Met ERW Benchmark Met Math Benchmark Met No Benchmarks Year
0 American Indian/Alaska Native 14,800 1% 936 473 463 22% 44% 24% 54% 2022
1 Asian 175,468 10% 1229 596 633 75% 84% 80% 11% 2022
2 Black/African American 201,645 12% 926 474 452 19% 44% 21% 54% 2022
3 Hispanic/Latino 396,422 23% 964 491 473 26% 52% 28% 47% 2022
4 Native Hawaiian/Other Pacific Islander 3,376 0% 945 481 464 24% 47% 26% 51% 2022
5 White 732,946 42% 1098 556 543 53% 77% 55% 21% 2022
6 Two or More Races 66,702 4% 1102 559 543 52% 77% 54% 22% 2022
7 No Response 146,319 8% 983 489 494 31% 47% 35% 49% 2022
8 American Indian/Alaska Native 10,288 1% 927 468 459 21% 42% 24% 56% 2021
9 Asian 167,208 11% 1239 597 642 78% 85% 83% 10% 2021
10 Black/African American 168,454 11% 934 477 457 22% 46% 23% 53% 2021
11 Hispanic/Latino 352,094 23% 967 490 477 28% 52% 30% 46% 2021
12 Native Hawaiian/Other Pacific Islander 3,015 0% 950 481 469 26% 48% 28% 50% 2021
13 White 635,486 42% 1112 562 550 57% 80% 59% 18% 2021
14 Two or More Races 54,961 4% 1116 565 551 56% 79% 58% 20% 2021
15 No Response 117,627 8% 976 483 493 31% 46% 36% 49% 2021
16 American Indian/Alaska Native 14,050 1% 902 456 447 17% 38% 20% 60% 2020
17 Asian 223,451 10% 1217 585 632 74% 83% 80% 11% 2020
18 Black/African American 261,326 12% 927 473 454 20% 44% 21% 54% 2020
19 Hispanic/Latino 569,370 26% 969 491 478 28% 53% 30% 45% 2020
20 Native Hawaiian/Other Pacific Islander 5,107 0% 948 478 470 24% 47% 27% 50% 2020
21 White 909,987 41% 1104 557 547 56% 79% 59% 19% 2020
22 Two or More Races 89,656 4% 1091 552 539 52% 76% 53% 22% 2020
23 No Response 125,513 6% 996 488 507 36% 49% 41% 45% 2020
24 American Indian/Alaska Native 12,917 1% 912 461 451 18% 39% 21% 58% 2019
25 Asian 228,527 10% 1223 586 637 75% 83% 80% 11% 2019
26 Black/African American 271,178 12% 933 476 457 20% 46% 22% 53% 2019
27 Hispanic/Latino 554,665 25% 978 495 483 29% 55% 31% 43% 2019
28 Native Hawaiian/Other Pacific Islander 5,430 0% 964 487 478 27% 51% 29% 47% 2019
29 White 947,842 43% 1114 562 553 57% 80% 59% 18% 2019
30 Two or More Races 87,178 4% 1095 554 540 51% 76% 53% 22% 2019
31 No Response 112,350 5% 959 472 487 28% 44% 34% 50% 2019
32 American Indian/Alaska Native 10,946 1% 949 480 469 24% 48% 26% 50% 2018
33 Asian 217,971 10% 1223 588 635 75% 85% 81% 10% 2018
34 Black/African American 263,318 12% 946 483 463 21% 50% 23% 49% 2018
35 Hispanic/Latino 499,442 23% 990 501 489 31% 58% 33% 40% 2018
36 Native Hawaiian/Other Pacific Islander 5,620 0% 986 498 489 31% 57% 33% 40% 2018
37 White 930,825 44% 1123 566 557 59% 82% 61% 16% 2018
38 Two or More Races 77,078 4% 1101 558 543 52% 78% 54% 20% 2018
39 No Response 131,339 6% 954 472 481 26% 44% 31% 51% 2018
40 American Indian/Alaska Native 7,782 0% 963 486 477 27% 53% 29% 45% 2017
41 Asian 158,031 9% 1181 569 612 70% 81% 76% 12% 2017
42 Black/African American 225,860 13% 941 479 462 20% 49% 22% 50% 2017
43 Hispanic/Latino 408,067 24% 990 500 489 31% 58% 33% 39% 2017
44 Native Hawaiian/Other Pacific Islander 4,131 0% 986 498 488 32% 57% 34% 40% 2017
45 White 760,362 44% 1118 565 553 59% 83% 61% 15% 2017
46 Two or More Races 57,049 3% 1103 560 544 54% 80% 56% 18% 2017
47 No Response 94,199 5% 961 475 485 27% 48% 33% 47% 2017
48 American Indian/Alaska Native 7778.0 0.0 939.0 468.0 471.0 NaN NaN NaN NaN 2016
49 Asian 196735.0 12.0 1131.0 529.0 602.0 NaN NaN NaN NaN 2016
50 Black/African American 199306.0 12.0 855.0 430.0 425.0 NaN NaN NaN NaN 2016
51 Native Hawaiian/Other Pacific Islander 2371.0 0.0 870.0 432.0 438.0 NaN NaN NaN NaN 2016
52 Hispanic/Latino 355829.0 22.0 901.0 448.0 453.0 NaN NaN NaN NaN 2016
53 White 742436.0 45.0 1061.0 528.0 533.0 NaN NaN NaN NaN 2016
54 Two or More Races 28460.0 2.0 1016.0 511.0 505.0 NaN NaN NaN NaN 2016
55 Other 20604.0 1.0 1015.0 496.0 519.0 NaN NaN NaN NaN 2016
56 No Response 84070.0 5.0 952.0 451.0 501.0 NaN NaN NaN NaN 2016
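The output above suggests that several columns arrived from Excel as text: test-taker counts carry thousands separators ("14,800") and percentages carry a "%" sign, while the 2016 rows are floats. Instead of relying on Plotly's `autotypenumbers='convert types'` at plot time, the columns could be converted once in pandas. This is a minimal sketch assuming the values look as displayed; the helper name `to_numeric_columns` is hypothetical.

```python
import pandas as pd


def to_numeric_columns(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy with thousands separators and '%' signs stripped, cast to numbers."""
    out = df.copy()
    for col in columns:
        cleaned = (out[col].astype(str)
                   .str.replace(',', '', regex=False)  # '14,800' -> '14800'
                   .str.rstrip('%'))                   # '42%' -> '42'
        # Anything still unparseable (including missing values) becomes NaN
        out[col] = pd.to_numeric(cleaned, errors='coerce')
    return out


# Two rows in the mixed formats shown above (values copied from the table)
sample = pd.DataFrame({
    'Race/Ethnicity': ['Asian', 'Asian'],
    'Number of Test Takers': ['175,468', 196735.0],
    'Percent': ['10%', 12.0],
    'Met Both Benchmarks': ['75%', float('nan')],
    'Year': ['2022', '2016'],
})
clean = to_numeric_columns(sample, ['Number of Test Takers', 'Percent',
                                    'Met Both Benchmarks', 'Year'])
print(clean.dtypes)
```

Applied to `df_ethnicity`, this would let later steps (sorting, differences, correlations) work on real numbers.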

Exploratory Data Analysis

In [12]:
fig = px.line(df_ethnicity, x='Year', y='Mean Total Score',color='Race/Ethnicity', markers=True)
fig.update_layout(autotypenumbers='convert types',
                 title_text = 'Examination of 2016-2022 SAT Scores and Racial/Ethnic Factors')
fig.show()

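An examination of mean total SAT scores from 2016 to 2022 reveals a stable ranking: Asian test takers score highest every year, followed by White test takers and those reporting Two or More Races (the latter edged slightly ahead of White test takers in 2021 and 2022), while Black/African American and American Indian/Alaska Native test takers post the lowest means in most years. The 2016 figures call for caution: that year's table has no benchmark columns and includes an "Other" category, and its scores appear to come from the version of the SAT used before the 2016 redesign, so part of the jump between 2016 and 2017 likely reflects the change in the test rather than in student performance.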
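To put a number on question 2, the year-by-year gap between groups can be computed from the mean total scores in the combined table. The sketch below copies the 2017 to 2022 values for three groups by hand (2016 is left out because its table has a different layout) and pivots them so that each gap is a single column subtraction. With the full notebook data, the same `pivot` call could be applied to `df_ethnicity` once its scores are numeric.

```python
import pandas as pd

# (group, year, mean total score) copied from the combined table above, 2017-2022
records = [
    ('Asian', 2017, 1181), ('White', 2017, 1118), ('Black/African American', 2017, 941),
    ('Asian', 2018, 1223), ('White', 2018, 1123), ('Black/African American', 2018, 946),
    ('Asian', 2019, 1223), ('White', 2019, 1114), ('Black/African American', 2019, 933),
    ('Asian', 2020, 1217), ('White', 2020, 1104), ('Black/African American', 2020, 927),
    ('Asian', 2021, 1239), ('White', 2021, 1112), ('Black/African American', 2021, 934),
    ('Asian', 2022, 1229), ('White', 2022, 1098), ('Black/African American', 2022, 926),
]
scores = pd.DataFrame(records, columns=['Race/Ethnicity', 'Year', 'Mean Total Score'])

# One row per year, one column per group
wide = scores.pivot(index='Year', columns='Race/Ethnicity', values='Mean Total Score')

# Point gaps relative to Black/African American test takers
gaps = pd.DataFrame({
    'Asian - Black': wide['Asian'] - wide['Black/African American'],
    'White - Black': wide['White'] - wide['Black/African American'],
})
print(gaps)
```

With these values, the gap between White and Black/African American test takers stays between 172 and 181 points, while the gap between Asian and Black/African American test takers widens from 240 points in 2017 to 303 in 2022.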

In [13]:
# Exploring the Influence of Race and Ethnicity on SAT Scores: A Comparative Analysis
fig_score_by_race = px.line(df_ethnicity_2022, x='Race/Ethnicity', y = ['Mean Total Score'], markers=True)
fig_score_by_race.update_layout(autotypenumbers='convert types', # Let Plotly treat numeric strings as numbers when typing the axes
                 title_text = 'Examination of 2022 SAT Scores and Racial/Ethnic Factors', # Title of the plot
                 xaxis_title = 'Race/Ethnicity', # x-axis label
                 legend_title = 'Legend',
                 yaxis_title = 'SAT Scores') # y-axis label

fig_score_by_race.show()

Looking at 2022 alone, the gaps between groups are stark. Asian test takers achieved the highest mean total score, 1229 (roughly the 75th to 81st percentile), followed by test takers reporting Two or More Races (1102) and White test takers (1098, roughly the 51st to 61st percentile). Black/African American and American Indian/Alaska Native test takers scored markedly lower, with means of 926 and 936 respectively (roughly the 27th to 35th percentile).
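One way to anchor these 2022 figures is to compare each group with the mean of all test takers. Assuming the eight 2022 rows (including "No Response") together cover every test taker, as their percentages summing to 100% suggest, a weighted average of the group means recovers that overall mean. The sketch copies the counts and scores from the combined table by hand.

```python
import pandas as pd

# 2022 rows from the combined table: (group, number of test takers, mean total score)
rows_2022 = [
    ('American Indian/Alaska Native', 14800, 936),
    ('Asian', 175468, 1229),
    ('Black/African American', 201645, 926),
    ('Hispanic/Latino', 396422, 964),
    ('Native Hawaiian/Other Pacific Islander', 3376, 945),
    ('White', 732946, 1098),
    ('Two or More Races', 66702, 1102),
    ('No Response', 146319, 983),
]
df = pd.DataFrame(rows_2022, columns=['Race/Ethnicity', 'Number of Test Takers',
                                      'Mean Total Score'])

# Weight each group's mean by its number of test takers
weights = df['Number of Test Takers']
overall_mean = (df['Mean Total Score'] * weights).sum() / weights.sum()

# Points above (+) or below (-) the weighted mean, largest first
df['Gap vs Weighted Mean'] = (df['Mean Total Score'] - overall_mean).round(1)
print(round(overall_mean, 1))  # prints 1049.5
print(df.sort_values('Gap vs Weighted Mean', ascending=False)
        [['Race/Ethnicity', 'Gap vs Weighted Mean']])
```

With these figures, Asian test takers sit roughly 180 points above the weighted mean and Black/African American test takers roughly 123 points below it.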

In [18]:
# Exploring the Influence of Race and Ethnicity on ERW and Math SAT Scores: A Comparative Analysis
fig_score_by_race = px.line(df_ethnicity_2022, x='Race/Ethnicity', y = ['Mean ERW Score','Mean Math Score'], markers=True)
fig_score_by_race.update_layout(autotypenumbers='convert types', # Let Plotly treat numeric strings as numbers when typing the axes
                 title_text = 'Examination of 2022 SAT ERW and Math Scores and Racial/Ethnic Factors', # Title of the plot
                 xaxis_title = 'Race/Ethnicity', # x-axis label
                 legend_title = 'Legend',
                 yaxis_title = 'SAT Scores') # y-axis label
fig_score_by_race.show()
In [15]:
fig_race_bar = px.histogram(df_ethnicity_2022,x='Race/Ethnicity',y = ['Mean Total Score','Mean ERW Score','Mean Math Score'],text_auto=True,barmode='group')
fig_race_bar.update_layout(title_text = 'Exploring the Influence of Race and Ethnicity on 2022 SAT Scores', # Title of the plot
                  legend_title = 'Legend',
                  xaxis_title = 'Race/Ethnicity', # x-axis label
                  yaxis_title = 'SAT Scores', # y-axis label
                  bargap=0.01, # Gap between bars of adjacent location coordinates
                  bargroupgap=0.01) # Gap between bars of the same location coordinate

fig_race_bar.show()

The 2022 section scores reveal a further pattern. Among the named racial and ethnic groups, Asian test takers were the only ones to score higher in Math (633) than in Evidence-Based Reading and Writing (ERW, 596), which may point to a relative strength in mathematics. Every other group, including White, Black/African American, Hispanic/Latino, American Indian/Alaska Native, Native Hawaiian/Other Pacific Islander, and multiracial test takers, scored higher in ERW than in Math; only test takers who gave no response on race also scored slightly higher in Math (494 versus 489). Black/African American test takers recorded the lowest mean Math score (452) and the lowest mean total score (926), and their ERW mean of 474 was one point above that of American Indian/Alaska Native test takers (473), underlining the ongoing challenge of addressing educational disparities.
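The section-level comparison can be checked directly by subtracting each group's ERW mean from its Math mean. The values below are the 2022 means from the combined table, typed in by hand; a positive difference means the group averaged higher on Math.

```python
import pandas as pd

# 2022 section means copied from the combined table
sections_2022 = pd.DataFrame({
    'Race/Ethnicity': ['American Indian/Alaska Native', 'Asian', 'Black/African American',
                       'Hispanic/Latino', 'Native Hawaiian/Other Pacific Islander',
                       'White', 'Two or More Races', 'No Response'],
    'Mean ERW Score': [473, 596, 474, 491, 481, 556, 559, 489],
    'Mean Math Score': [463, 633, 452, 473, 464, 543, 543, 494],
})

# Positive values: the group averaged higher on Math than on ERW
sections_2022['Math - ERW'] = (sections_2022['Mean Math Score']
                               - sections_2022['Mean ERW Score'])
math_higher = sections_2022.loc[sections_2022['Math - ERW'] > 0, 'Race/Ethnicity'].tolist()
print(math_higher)  # prints ['Asian', 'No Response']
```

The same table also makes it easy to confirm which group has the lowest mean in each section, for example with `idxmin()` on each score column.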

In [16]:
# Analysis of the influence of family income on SAT scores
fig_income_line = px.line(df_family_income_2022, x='Mean Family Income', y = ['Mean Total Score'], markers=True)
fig_income_line.update_layout(autotypenumbers='convert types', # Let Plotly treat numeric strings as numbers when typing the axes
                 title_text = 'Exploring the Influence of Family Income on 2022 SAT Scores', # Title of the plot
                 xaxis_title = 'Family Income', # x-axis label
                 legend_title = 'Legend',
                 yaxis_title = 'SAT Scores') # y-axis label

fig_income_line.show()

The 2022 income breakdown shows a clear positive relationship: the higher a group's family income, the higher its mean total SAT score. This pattern underscores the role of socioeconomic factors in educational achievement and is consistent with the advantages that come with higher family income.

In [17]:
fig_income_bar = px.histogram(df_US_US_medianincome2021,x='Race/Ethnicity',y = ['Median Household Income'],text_auto=True,barmode='group')
fig_income_bar.update_layout(title_text = 'Median Household Income in the United States in 2021 by Race/Ethnicity', # Title of the plot
                  legend_title = 'Legend',
                  xaxis_title = 'Race/Ethnicity', # x-axis label
                  yaxis_title = 'Median Household Income (USD)', # y-axis label
                  bargap=0.01, # Gap between bars of adjacent location coordinates
                  bargroupgap=0.01) # Gap between bars of the same location coordinate

fig_income_bar.show()

The 2021 median household income data by race/ethnicity point in the same direction. Asian households had the highest median household income, $101,418, in line with Asian test takers' consistently high SAT scores. Black/African American households had the lowest, $45,208, and Black/African American test takers are among the lowest-scoring groups. This group-level alignment is consistent with economic disparities shaping educational achievement.
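The link between income and scores can be quantified with a correlation across groups. The sketch below pairs the 2021 median household incomes with the 2021 mean total SAT scores for the five groups that appear in both sources, after harmonizing the one label spelled differently ("American Indian and Alaska Native" in the income table versus "American Indian/Alaska Native" in the SAT table). With only five group-level points, the coefficient describes how group averages move together; it says nothing about individual students and does not by itself establish causation.

```python
import pandas as pd

# 2021 median household income by race/ethnicity, as listed in the notebook's income table
income_2021 = pd.DataFrame({
    'Race/Ethnicity': ['American Indian and Alaska Native', 'Asian', 'Black/African American',
                       'Hispanic/Latino', 'White'],
    'Median Household Income': [49216, 101418, 45208, 57981, 71033],
})

# 2021 mean total SAT scores copied from the combined College Board table
sat_2021 = pd.DataFrame({
    'Race/Ethnicity': ['American Indian/Alaska Native', 'Asian', 'Black/African American',
                       'Hispanic/Latino', 'White'],
    'Mean Total Score': [927, 1239, 934, 967, 1112],
})

# The two sources spell one group differently; harmonize before merging
income_2021['Race/Ethnicity'] = income_2021['Race/Ethnicity'].replace(
    {'American Indian and Alaska Native': 'American Indian/Alaska Native'})

# Inner join keeps only groups present in both tables; validate catches duplicate labels
merged = income_2021.merge(sat_2021, on='Race/Ethnicity', how='inner',
                           validate='one_to_one')

# Series.corr computes the Pearson coefficient by default
r = merged['Median Household Income'].corr(merged['Mean Total Score'])
print(round(r, 2))  # prints 0.98
```

For these five groups the Pearson coefficient is about 0.98, a strong group-level association that matches the visual impression from the two charts.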

Conclusion

The analysis of SAT scores across racial and ethnic groups in the United States reveals persistent disparities. Asian test takers consistently score highest, followed by White and multiracial test takers, while Black/African American and American Indian/Alaska Native test takers score lowest in most years, showing that SAT performance differs substantially by race and ethnicity. The ranking of groups changed little between 2016 and 2022, highlighting persistent systemic challenges. Family income is strongly associated with SAT performance, with higher-income groups posting higher scores. This income gradient underscores the role of socioeconomic status in educational outcomes and raises concerns about unequal access to test preparation resources. These findings have implications for college admissions: students from groups with lower average scores may face barriers, which points to the need for holistic admissions criteria and comprehensive efforts to promote educational equity.

Taken together, these findings point to the need for systemic solutions to address disparities in the American education system. Initiatives focusing on equitable access to quality education, increased investment in underserved communities, culturally responsive teaching practices, and affordable test preparation resources are vital steps toward educational equity. The college admissions process must also account for these disparities and adapt to create a more inclusive and fair environment for all students.
